Omitting the superscript t, we have the i-th component of ∂A/∂w as

\left(\frac{\partial A}{\partial w}\right)_i =
\begin{bmatrix}
0 & \cdots & \cdots & \cdots & 0 \\
\vdots & & \vdots & & \vdots \\
\frac{\partial A_{i,i}}{\partial w_{i,1}} & \cdots & \frac{\partial A_{i,i}}{\partial w_{i,j}} & \cdots & \frac{\partial A_{i,i}}{\partial w_{i,J}} \\
\vdots & & \vdots & & \vdots \\
0 & \cdots & \cdots & \cdots & 0
\end{bmatrix},
(3.126)
from which we can derive

w\hat{G}(w, A) =
\begin{bmatrix}
w_1\hat{g}_1 & \cdots & w_1\hat{g}_i & \cdots & w_1\hat{g}_I \\
\vdots & & \vdots & & \vdots \\
w_I\hat{g}_1 & \cdots & w_I\hat{g}_i & \cdots & w_I\hat{g}_I
\end{bmatrix}.
(3.127)
Combining Eq. 3.126 and Eq. 3.127, we get

w\hat{G}(w, A)\left(\frac{\partial A}{\partial w}\right)_i =
\begin{bmatrix}
w_1\hat{g}_i\frac{\partial A_{i,i}}{\partial w_{i,1}} & \cdots & w_1\hat{g}_i\frac{\partial A_{i,i}}{\partial w_{i,j}} & \cdots & w_1\hat{g}_i\frac{\partial A_{i,i}}{\partial w_{i,J}} \\
\vdots & & \vdots & & \vdots \\
w_i\hat{g}_i\frac{\partial A_{i,i}}{\partial w_{i,1}} & \cdots & w_i\hat{g}_i\frac{\partial A_{i,i}}{\partial w_{i,j}} & \cdots & w_i\hat{g}_i\frac{\partial A_{i,i}}{\partial w_{i,J}} \\
\vdots & & \vdots & & \vdots \\
w_I\hat{g}_i\frac{\partial A_{i,i}}{\partial w_{i,1}} & \cdots & w_I\hat{g}_i\frac{\partial A_{i,i}}{\partial w_{i,j}} & \cdots & w_I\hat{g}_i\frac{\partial A_{i,i}}{\partial w_{i,J}}
\end{bmatrix}.
(3.128)
After that, the i-th component of the trace term in Eq. 6.72 is calculated as

\mathrm{Tr}\!\left[w\hat{G}\left(\frac{\partial A}{\partial w}\right)_i\right]
= w_i\hat{g}_i\sum_{j=1}^{J}\frac{\partial A_{i,i}}{\partial w_{i,j}}.
(3.129)
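To make the block structure of Eqs. 3.126–3.129 concrete, here is a small NumPy sketch, not part of the original method, that builds the matrices for toy sizes and checks the rank-one structure of Eq. 3.128. All names (w_ch, g_hat, dAii_dwi, and so on) are illustrative placeholders, and w_1, ..., w_I and ĝ_1, ..., ĝ_I are treated as per-channel scalars purely for illustration.

import numpy as np

I, J = 4, 3                      # toy number of output channels and weights per filter
i = 2                            # channel index considered in Eq. 3.126 (0-based here)

w_ch = np.random.randn(I)        # stand-ins for w_1, ..., w_I
g_hat = np.random.randn(I)       # stand-ins for \hat{g}_1, ..., \hat{g}_I
dAii_dwi = np.random.randn(J)    # stand-ins for ∂A_{i,i}/∂w_{i,1}, ..., ∂A_{i,i}/∂w_{i,J}

# Eq. 3.126: (∂A/∂w)_i is zero everywhere except its i-th row.
dA_dw_i = np.zeros((I, J))
dA_dw_i[i, :] = dAii_dwi

# Eq. 3.127: w \hat{G}(w, A) has entry (m, n) = w_m * \hat{g}_n, i.e. an outer product.
wG = np.outer(w_ch, g_hat)

# Eq. 3.128: the product keeps only column i of wG and row i of (∂A/∂w)_i, so
# entry (m, j) equals w_m * \hat{g}_i * ∂A_{i,i}/∂w_{i,j} and the matrix is rank one.
prod = wG @ dA_dw_i
assert np.allclose(prod, np.outer(w_ch * g_hat[i], dAii_dwi))

# Eq. 3.129: the i-th trace component, w_i * \hat{g}_i * sum_j ∂A_{i,i}/∂w_{i,j}.
trace_i = w_ch[i] * g_hat[i] * dAii_dwi.sum()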
Combining Eq. 6.72 and Eq. 3.129, we obtain

\hat{w}^{t+1} = w^{t+1} - \eta_2\lambda
\begin{bmatrix}
\hat{g}^{t}_{1}\sum_{j=1}^{J}\frac{\partial A^{t}_{1,1}}{\partial w^{t}_{1,j}} \\
\vdots \\
\hat{g}^{t}_{I}\sum_{j=1}^{J}\frac{\partial A^{t}_{I,I}}{\partial w^{t}_{I,j}}
\end{bmatrix}
\circledast
\begin{bmatrix}
w^{t}_{1} \\
\vdots \\
w^{t}_{I}
\end{bmatrix}
= w^{t+1} + \eta_2\lambda\, d^{t} \circledast w^{t},
(3.130)
where η_2 is the learning rate of the real-valued weight filters w_i, and ⊛ denotes the Hadamard product. We take d^t = -\left[\hat{g}^{t}_{1}\sum_{j=1}^{J}\partial A^{t}_{1,1}/\partial w^{t}_{1,j}, \cdots, \hat{g}^{t}_{I}\sum_{j=1}^{J}\partial A^{t}_{I,I}/\partial w^{t}_{I,j}\right]^{T}, which is unsolvable and undefined in the backpropagation of BNNs.
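If these per-channel sums were available, however, the update in Eq. 3.130 would reduce to a simple channel-wise rescaling of the filters. Below is a minimal NumPy sketch under that assumption, where sum_dA is a hypothetical stand-in for Σ_j ∂A^t_{i,i}/∂w^t_{i,j} and every shape and constant is illustrative rather than prescribed by the text.

import numpy as np

I, J = 4, 3                          # toy number of channels and weights per filter
eta2, lam = 1e-3, 1e-4               # assumed learning rate η_2 and balance weight λ

w_t = np.random.randn(I, J)          # real-valued filters w^t (one row per output channel)
w_tp1 = np.random.randn(I, J)        # w^{t+1} after the ordinary gradient step
g_hat_t = np.random.randn(I)         # \hat{g}^t_1, ..., \hat{g}^t_I
sum_dA = np.random.randn(I)          # hypothetical stand-in for sum_j ∂A^t_{i,i}/∂w^t_{i,j}

# d^t from Eq. 3.130: one entry per output channel, with the minus sign folded in.
d_t = -(g_hat_t * sum_dA)

# Channel-wise Hadamard product ⊛: each filter w^t_i is rescaled by d^t_i.
w_hat_tp1 = w_tp1 + eta2 * lam * d_t[:, None] * w_t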
To address this issue, we employ a recurrent model to approximate d^t, which gives

\hat{w}^{t+1} = w^{t+1} + U^{t} \circ \mathrm{DReLU}(w^{t}, A^{t}),
(3.131)

and

w^{t+1} \leftarrow \hat{w}^{t+1},
(3.132)
where we introduce a hidden layer with channel-wise learnable weights U \in \mathbb{R}^{C_{out}}_{+} to recurrently backtrack w. We present DReLU to supervise this optimization process and realize a controllable recurrent optimization. Channel-wise, we implement DReLU as

\mathrm{DReLU}(w_i, A_i) =
\begin{cases}
w_i & \text{if } (\neg D(w'_i)) \wedge D(A_i) = 1, \\
0 & \text{otherwise},
\end{cases}
(3.133)
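The recurrent backtracking of Eqs. 3.131–3.133 can be sketched channel-wise as below. The gate D(·) and the variable w'_i are defined elsewhere in the text, so a placeholder predicate stands in for D here; its threshold, the array layouts, and the function names are all assumptions made only for this sketch.

import numpy as np

def D(x, thresh=1e-2):
    # Placeholder for the gate D(.) of Eq. 3.133; the actual definition is given
    # elsewhere in the text. Here it simply checks whether the values are small.
    return bool(np.abs(x).mean() < thresh)

def drelu(w_i, A_ii, w_prime_i):
    # Eq. 3.133, channel-wise: pass w_i through only when (¬D(w'_i)) ∧ D(A_i) = 1.
    if (not D(w_prime_i)) and D(A_ii):
        return w_i
    return np.zeros_like(w_i)

def recurrent_backtrack(w_tp1, w_t, A_diag_t, w_prime_t, U_t):
    # Eqs. 3.131-3.132: \hat{w}^{t+1} = w^{t+1} + U^t ∘ DReLU(w^t, A^t), applied per
    # output channel, followed by w^{t+1} <- \hat{w}^{t+1}. A_diag_t holds the
    # diagonal entries A^t_{i,i}, and U_t the channel-wise learnable weights U.
    w_hat = w_tp1.copy()
    for i in range(w_t.shape[0]):
        w_hat[i] += U_t[i] * drelu(w_t[i], A_diag_t[i], w_prime_t[i])
    return w_hat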